    Untranslated Parts of Genes Interpreted: Making Heads or Tails of High-Throughput Transcriptomic Data via Computational Methods: Computational methods to discover and quantify isoforms with alternative untranslated regions

    In this review we highlight the importance of defining the untranslated parts of transcripts, and present a number of computational approaches for the discovery and quantification of alternative transcription start and poly‐adenylation events in high‐throughput transcriptomic data. The fate of eukaryotic transcripts is closely linked to their untranslated regions, which are determined by the position at which transcription starts and ends at a genomic locus. Although the extent of alternative transcription starts and alternative poly‐adenylation sites has been revealed by sequencing methods focused on the ends of transcripts, the application of these methods is not yet widely adopted by the community. We suggest that computational methods applied to standard high‐throughput technologies are a useful, albeit less accurate, alternative to the expertise‐demanding 5′ and 3′ sequencing and they are the only option for analysing legacy transcriptomic data. We review these methods here, focusing on technical challenges and arguing for the need to include better normalization of the data and more appropriate statistical models of the expected variation in the signal

    KSHV SOX mediated host shutoff: the molecular mechanism underlying mRNA transcript processing

    Onset of the lytic phase in the KSHV life cycle is accompanied by the rapid, global degradation of host (and viral) mRNA transcripts in a process termed host shutoff. Key to this destruction is the virally encoded alkaline exonuclease SOX. While SOX has been shown to possess an intrinsic RNase activity and a potential consensus sequence for endonucleolytic cleavage identified, the structures of the RNA substrates targeted remained unclear. Based on an analysis of three reported target transcripts, we were able to identify common structures and confirm that these are indeed degraded by SOX in vitro as well as predict the presence of such elements in the KSHV pre-microRNA transcript K12-2. From these studies, we were able to determine the crystal structure of SOX productively bound to a 31 nucleotide K12-2 fragment. This complex not only reveals the structural determinants required for RNA recognition and degradation but, together with biochemical and biophysical studies, reveals distinct roles for residues implicated in host shutoff. Our results further confirm that SOX and the host exoribonuclease Xrn1 act in concert to elicit the rapid degradation of mRNA substrates observed in vivo, and that the activities of the two ribonucleases are co-ordinated

    Quantitative global studies of reactomes and metabolomes using a vectorial representation of reactions and chemical compounds

    Background: Global studies of the protein repertories of organisms are providing important information on the characteristics of the protein space. Many of these studies entail classification of the protein repertory on the basis of structure and/or sequence similarities. The situation is different for metabolism. Because there is no good way of measuring similarities between chemical reactions, there is a barrier to the development of global classifications of "metabolic space" and subsequent studies comparable to those done for protein sequences and structures.
    Results: In this work, we propose a vectorial representation of chemical reactions, which allows them to be compared and classified. In this representation, chemical compounds, reactions and pathways may be represented in the same vectorial space. We show that the representation of chemical compounds reflects their physicochemical properties and can be used for predictive purposes. We use the vectorial representations of reactions to perform a global classification of the reactome of the model organism E. coli.
    Conclusions: We show that this unsupervised clustering results in groups of enzymes more coherent in biological terms than equivalent groupings obtained from the EC hierarchy. This hierarchical clustering produces an optimal set of 21 groups which we analyzed for their biological meaning.

    Cmr is a redox-responsive regulator of DosR that contributes to M. tuberculosis virulence.

    Mycobacterium tuberculosis (Mtb) is the causative agent of pulmonary tuberculosis (TB). Mtb colonizes the human lung, often entering a non-replicating state before progressing to life-threatening active infections. Transcriptional reprogramming is essential for TB pathogenesis. In vitro, Cmr (a member of the CRP/FNR super-family of transcription regulators) bound at a single DNA site to act as a dual regulator of cmr transcription and an activator of the divergent rv1676 gene. Transcriptional profiling and DNA-binding assays suggested that Cmr directly represses dosR expression. The DosR regulon is thought to be involved in establishing latent tuberculosis infections in response to hypoxia and nitric oxide. Accordingly, DNA-binding by Cmr was severely impaired by nitrosation. A cmr mutant was better able to survive a nitrosative stress challenge but was attenuated in a mouse aerosol infection model. The complemented mutant exhibited a ∼2-fold increase in cmr expression, which led to increased sensitivity to nitrosative stress. This, and the inability to restore wild-type behaviour in the infection model, suggests that precise regulation of the cmr locus, which is associated with Region of Difference 150 in hypervirulent Beijing strains of Mtb, is important for TB pathogenesis.

    Multi-scale sequence correlations increase proteome structural disorder and promiscuity

    Numerous experiments demonstrate a high level of promiscuity and structural disorder in organismal proteomes. Here we ask the question what makes a protein promiscuous, i.e., prone to non-specific interactions, and structurally disordered. We predict that multi-scale correlations of amino acid positions within protein sequences statistically enhance the propensity for promiscuous intra- and inter-protein binding. We show that sequence correlations between amino acids of the same type are statistically enhanced in structurally disordered proteins and in hubs of organismal proteomes. We also show that structurally disordered proteins possess a significantly higher degree of sequence order than structurally ordered proteins. We develop an analytical theory for this effect and predict the robustness of our conclusions with respect to the amino acid composition and the form of the microscopic potential between the interacting sequences. Our findings have implications for understanding molecular mechanisms of protein aggregation diseases induced by the extension of sequence repeats

    Physiochemical property space distribution among human metabolites, drugs and toxins

    Background: The current approach to screen for drug-like molecules is to sieve for molecules with biochemical properties suitable for desirable pharmacokinetics and reduced toxicity, using predominantly biophysical properties of chemical compounds, based on empirical rules such as Lipinski's "rule of five" (Ro5). For over a decade, Ro5 has been applied to combinatorial compounds, drugs and ligands, in the search for suitable lead compounds. Unfortunately, to date, a clear distinction between drugs and non-drugs has not been achieved. The current trend is to seek out drugs which show metabolite-likeness. In identifying similar physicochemical characteristics, compounds have usually been clustered based on some characteristic, to reduce the search space presented by large molecular datasets. This paper examines the similarity of current drug molecules with human metabolites and toxins, using a range of computed molecular descriptors, as well as the effect of comparing against clustered data rather than searching complete datasets.
    Results: We have carried out statistical and substructure functional group analyses of three datasets, namely human metabolites, drugs and toxin molecules. The distributions of various molecular descriptors were investigated. Our analyses show that, although the three groups are distinct, present-day drugs are closer to toxin molecules than to metabolites. Furthermore, these distributions are quite similar for both clustered data and complete (unclustered) datasets.
    Conclusion: The property space occupied by metabolites is dissimilar to that of drugs or toxin molecules, with current drugs showing greater similarity to toxins than to metabolites. Additionally, empirical rules like Ro5 can be refined to identify drugs or drug-like molecules that are clearly distinct from toxic compounds and more metabolite-like. The inclusion of human metabolites in this study provides a deeper insight into metabolite/drug/toxin-like properties and will also prove to be valuable in the prediction or optimization of small molecules as ligands for therapeutic applications.
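
For readers unfamiliar with the "rule of five" mentioned in this abstract, the following is a minimal sketch of the commonly cited Lipinski thresholds (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors). It is not the paper's analysis pipeline. The `Descriptors` class, the function names, the one-violation tolerance default, and the example descriptor values are all illustrative assumptions; descriptor values would normally come from a cheminformatics toolkit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed molecular descriptors (hypothetical container)."""
    mol_weight: float       # molecular weight in daltons
    log_p: float            # octanol-water partition coefficient
    h_bond_donors: int      # number of hydrogen-bond donors
    h_bond_acceptors: int   # number of hydrogen-bond acceptors


def ro5_violations(d: Descriptors) -> int:
    """Count how many of the four Lipinski thresholds are exceeded."""
    checks = [
        d.mol_weight > 500,
        d.log_p > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ]
    return sum(checks)


def passes_ro5(d: Descriptors, max_violations: int = 1) -> bool:
    """Treat a compound as Ro5-compliant if it has few enough violations.

    Allowing one violation is a common convention; it is an assumption
    here, not something stated in the abstract above.
    """
    return ro5_violations(d) <= max_violations


# Illustrative (made-up) descriptor values, not measured data.
small_molecule = Descriptors(180.2, 1.2, 1, 4)
large_molecule = Descriptors(1200.0, 6.5, 8, 15)

print(ro5_violations(small_molecule), passes_ro5(small_molecule))
print(ro5_violations(large_molecule), passes_ro5(large_molecule))
```

A filter of this kind only separates compounds by four coarse thresholds, which is consistent with the abstract's point that Ro5 alone does not cleanly distinguish drugs from non-drugs.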

    Evolutionarily Conserved Substrate Substructures for Automated Annotation of Enzyme Superfamilies

    The evolution of enzymes affects how well a species can adapt to new environmental conditions. During enzyme evolution, certain aspects of molecular function are conserved while other aspects can vary. Aspects of function that are more difficult to change or that need to be reused in multiple contexts are often conserved, while those that vary may indicate functions that are more easily changed or that are no longer required. In analogy to the study of conservation patterns in enzyme sequences and structures, we have examined the patterns of conservation and variation in enzyme function by analyzing graph isomorphisms among enzyme substrates of a large number of enzyme superfamilies. This systematic analysis of substrate substructures establishes the conservation patterns that typify individual superfamilies. Specifically, we determined the chemical substructures that are conserved among all known substrates of a superfamily and the substructures that are reacting in these substrates and then examined the relationship between the two. Across the 42 superfamilies that were analyzed, substantial variation was found in how much of the conserved substructure is reacting, suggesting that superfamilies may not be easily grouped into discrete and separable categories. Instead, our results suggest that many superfamilies may need to be treated individually for analyses of evolution, function prediction, and guiding enzyme engineering strategies. Annotating superfamilies with these conserved and reacting substructure patterns provides information that is orthogonal to information provided by studies of conservation in superfamily sequences and structures, thereby improving the precision with which we can predict the functions of enzymes of unknown function and direct studies in enzyme engineering. Because the method is automated, it is suitable for large-scale characterization and comparison of fundamental functional capabilities of both characterized and uncharacterized enzyme superfamilies.

    Hydrophobicity and Charge Shape Cellular Metabolite Concentrations

    What governs the concentrations of metabolites within living cells? Beyond specific metabolic and enzymatic considerations, are there global trends that affect their values? We hypothesize that the physico-chemical properties of metabolites considerably affect their in-vivo concentrations. The recently achieved experimental capability to measure the concentrations of many metabolites simultaneously has made the testing of this hypothesis possible. Here, we analyze such recently available data sets of metabolite concentrations within E. coli, S. cerevisiae, B. subtilis and human. Overall, these data sets encompass more than twenty conditions, each containing dozens (28-108) of simultaneously measured metabolites. We test for correlations with various physico-chemical properties and find that the number of charged atoms, non-polar surface area, lipophilicity and solubility consistently correlate with concentration. In most data sets, a change in one of these properties elicits a ∼100-fold increase in metabolite concentrations. We find that the non-polar surface area and number of charged atoms account for almost half of the variation in concentrations in the most reliable and comprehensive data set. Analyzing specific groups of metabolites, such as amino acids or phosphorylated nucleotides, reveals an even higher dependence of concentration on hydrophobicity. We suggest that these findings can be explained by evolutionary constraints imposed on metabolite concentrations and discuss possible selective pressures that can account for them. These include the reduction of solute leakage through the lipid membrane, avoidance of deleterious aggregates and reduction of non-specific hydrophobic binding. By highlighting the global constraints imposed on metabolic pathways, future research could shed light onto aspects of biochemical evolution and the chemical constraints that bound metabolic engineering efforts.

    The Impact of Multifunctional Genes on "Guilt by Association" Analysis

    Many previous studies have shown that by using variants of “guilt-by-association”, gene function predictions can be made with very high statistical confidence. In these studies, it is assumed that the “associations” in the data (e.g., protein interaction partners) of a gene are necessary in establishing “guilt”. In this paper we show that multifunctionality, rather than association, is a primary driver of gene function prediction. We first show that knowledge of the degree of multifunctionality alone can produce astonishingly strong performance when used as a predictor of gene function. We then demonstrate how multifunctionality is encoded in gene interaction data (such as protein interactions and coexpression networks) and how this can feed forward into gene function prediction algorithms. We find that high-quality gene function predictions can be made using data that possesses no information on which gene interacts with which. By examining a wide range of networks from mouse, human and yeast, as well as multiple prediction methods and evaluation metrics, we provide evidence that this problem is pervasive and does not reflect the failings of any particular algorithm or data type. We propose computational controls that can be used to provide more meaningful control when estimating gene function prediction performance. We suggest that this source of bias due to multifunctionality is important to control for, with widespread implications for the interpretation of genomics studies
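
The idea that multifunctionality alone can act as a predictor may be easier to see in a toy sketch. The code below is an assumption-laden illustration, not the authors' procedure: it ranks genes by how many functions they are annotated with, then scores that single ranking against one function using the area under the ROC curve (computed as the probability that a positive gene outranks a negative one, with ties counted as one half). The function names and the toy annotation table are hypothetical, and for simplicity the count includes the function being evaluated; a real evaluation would hold annotations out.

```python
from typing import Dict, List, Set


def auroc(scores: List[float], labels: List[bool]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative contribute 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative gene")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def multifunctionality_auroc(annotations: Dict[str, Set[str]], function: str) -> float:
    """Score the 'number of annotated functions' ranking against one function.

    No interaction or association data is used at all: the only input is
    how many functions each gene has.
    """
    genes = sorted(annotations)
    scores = [float(len(annotations[g])) for g in genes]
    labels = [function in annotations[g] for g in genes]
    return auroc(scores, labels)


# Toy, made-up annotation table (gene -> set of function labels).
toy_annotations = {
    "g1": {"A", "B", "C"},
    "g2": {"A", "B"},
    "g3": {"C"},
    "g4": set(),
}

for fn in ("A", "C"):
    print(fn, multifunctionality_auroc(toy_annotations, fn))
```

A ranking like this can serve as a baseline: a network-based method that does not beat it on held-out annotations is arguably not learning much from the associations themselves.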